146 research outputs found
Where Have You Been? Secure Location Provenance for Mobile Devices
With the advent of mobile computing, location-based services have recently
gained popularity. Many applications use the location provenance of users,
i.e., the chronological history of the users' locations, for purposes ranging
from access control and authentication to information sharing and policy
evaluation. However, location provenance is subject to tampering and collusion
attacks by malicious users. In this paper, we examine the secure location
provenance problem. We introduce a witness-endorsed scheme for generating
collusion-resistant location proofs. We also describe two efficient and
privacy-preserving schemes for protecting the integrity of the chronological
order of location proofs. These schemes, based on hash chains and Bloom filters
respectively, allow users to prove the order of any arbitrary subsequence of
their location history to auditors. Finally, we present experimental results
from our proof-of-concept implementation on the Android platform and show that
our schemes are practical on today's mobile devices.
Comment: 14 page
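As a rough illustration of the hash-chain idea for protecting chronological order (a minimal sketch, not the paper's actual protocol; the proof strings and seed value are invented for the example, and a real scheme would carry signed witness endorsements):

```python
import hashlib

def chain_proofs(proofs, seed=b"genesis"):
    """Build a hash chain over location proofs in chronological order.
    Each link commits to every earlier proof, so reordering or dropping
    any proof changes all subsequent digests."""
    digests = []
    prev = seed
    for proof in proofs:
        prev = hashlib.sha256(prev + proof.encode()).digest()
        digests.append(prev)
    return digests

# Hypothetical location proofs for one user.
history = ["airport@09:00", "office@10:30", "cafe@13:15"]
chain = chain_proofs(history)
tampered = chain_proofs([history[0], history[2], history[1]])
assert chain[-1] != tampered[-1]  # reordering breaks the chain
```

An auditor who holds the final digest can thus detect any reordering of the claimed history; the Bloom-filter variant in the paper trades exactness for space on the same ordering question.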
LifeRaft: Data-Driven, Batch Processing for the Exploration of Scientific Databases
Workloads that comb through vast amounts of data are gaining importance in
the sciences. These workloads consist of "needle in a haystack" queries that
are long running and data intensive so that query throughput limits
performance. To maximize throughput for data-intensive queries, we put forth
LifeRaft: a query processing system that batches queries with overlapping data
requirements. Rather than scheduling queries in arrival order, LifeRaft
executes queries concurrently against an ordering of the data that maximizes
data sharing among queries. This decreases I/O and increases cache utility.
However, such batch processing can increase query response time by starving
interactive workloads. LifeRaft addresses starvation using techniques inspired
by head scheduling in disk drives. Depending upon the workload saturation and
queuing times, the system adaptively and incrementally trades off processing
queries in arrival order and data-driven batch processing. Evaluating LifeRaft
in the SkyQuery federation of astronomy databases reveals a two-fold
improvement in query throughput.
Comment: CIDR 200
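A toy sketch of the data-driven batching idea (block and query structures are invented for the example; LifeRaft's actual scheduler also balances against starvation, which this omits): scan each block once and serve every query that needs it, so I/O is bounded by the number of distinct blocks rather than the sum of per-query reads.

```python
from collections import defaultdict

def batched_scan(blocks, queries):
    """Execute queries against a data-ordered pass over the blocks.
    blocks: {block_id: payload}; queries: {query_id: set of block_ids}."""
    wanted = defaultdict(list)            # block id -> interested queries
    for qid, needed in queries.items():
        for b in needed:
            wanted[b].append(qid)
    results = defaultdict(list)
    io_count = 0
    for b, payload in blocks.items():     # one sequential, data-driven pass
        io_count += 1
        for qid in wanted[b]:
            results[qid].append(payload)  # shared read serves many queries
    return dict(results), io_count

blocks = {0: "stars-A", 1: "stars-B", 2: "stars-C"}
queries = {"q1": {0, 1}, "q2": {1, 2}, "q3": {1}}
results, io = batched_scan(blocks, queries)
assert io == 3  # 3 block reads satisfy 5 query-block requests
```

Per-query scheduling would have read block 1 three times; the batched pass reads it once.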
Optimize Unsynchronized Garbage Collection in an SSD Array
Solid state disks (SSDs) have advanced to outperform traditional hard drives
significantly in both random reads and writes. However, heavy random writes
trigger frequent garbage collection and decrease the performance of SSDs. In
an SSD array, garbage collection of individual SSDs is not synchronized,
leading to underutilization of some of the SSDs.
We propose a software solution to tackle the unsynchronized garbage
collection in an SSD array installed in a host bus adaptor (HBA), where
individual SSDs are exposed to an operating system. We maintain a long I/O
queue for each SSD and flush dirty pages intelligently to fill the long I/O
queues so that we hide the performance imbalance among SSDs even when there are
few parallel application writes. We further define a policy of selecting
dirty pages to flush and a policy of discarding stale flush requests to reduce
the amount of data written to SSDs. We evaluate our solution in a real system.
Experiments show that our solution fully utilizes all SSDs in an array under
random write-heavy workloads. It improves I/O throughput by up to 62% under
random workloads of mixed reads and writes when SSDs are under active garbage
collection. It causes little extra data writeback and increases the cache hit
rate.
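A simplified sketch of the queue-filling and stale-request policies (SSD names, page ids, and the target depth are illustrative; the real system also tracks garbage-collection state per drive):

```python
def plan_flushes(dirty_pages, queue_depths, target_depth=32):
    """Top up each SSD's I/O queue to target_depth with its pending
    dirty pages, so a drive stalled by garbage collection keeps a deep
    backlog and never leaves the other drives idle.
    dirty_pages: {ssd: [page ids, oldest first]}; queue_depths: {ssd: depth}."""
    plan = {}
    for ssd, pages in dirty_pages.items():
        # Discard stale requests: keep only the latest flush per page id.
        latest = list(dict.fromkeys(reversed(pages)))[::-1]
        room = max(0, target_depth - queue_depths.get(ssd, 0))
        plan[ssd] = latest[:room]
    return plan

dirty = {"ssd0": ["p1", "p2", "p1"], "ssd1": ["p9"]}
depths = {"ssd0": 30, "ssd1": 0}
plan = plan_flushes(dirty, depths, target_depth=32)
assert plan["ssd0"] == ["p2", "p1"]  # stale first write of p1 dropped
```

Collapsing repeated writes to the same page before they reach the device is what reduces the total data written to the SSDs.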
Active Community Detection in Massive Graphs
A canonical problem in graph mining is the detection of dense communities.
This problem is exacerbated for a graph with a large order and size -- the
number of vertices and edges -- as many community detection algorithms scale
poorly. In this work we propose a novel framework for detecting active
communities that consist of the most active vertices in massive graphs. The
framework is applicable to graphs having billions of vertices and hundreds of
billions of edges. Our framework utilizes a parallelizable trimming algorithm
based on a locality statistic to filter out inactive vertices, and then
clusters the remaining active vertices via spectral decomposition on their
similarity matrix. We demonstrate the validity of our method with synthetic
Stochastic Block Model graphs, using Adjusted Rand Index as the performance
metric. We further demonstrate its practicality and efficiency on a recent
real-world Hyperlink Web graph consisting of over 3.5 billion vertices and 128
billion edges.
Comment: published in SDM-Networks 201
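A sketch of the trimming step only (vertex degree is used here as a stand-in for the paper's locality statistic, and the subsequent spectral clustering is omitted):

```python
def trim_inactive(adj, keep_fraction=0.1):
    """Rank vertices by an activity score and keep only the top fraction,
    restricting edges to the surviving vertices. This shrinks a massive
    graph before the expensive spectral-decomposition step.
    adj: {vertex: [neighbors]}."""
    score = {v: len(nbrs) for v, nbrs in adj.items()}  # stand-in statistic
    k = max(1, int(len(adj) * keep_fraction))
    active = set(sorted(score, key=score.get, reverse=True)[:k])
    return {v: [u for u in adj[v] if u in active] for v in active}

adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1], 4: [], 5: []}
trimmed = trim_inactive(adj, keep_fraction=0.5)
assert set(trimmed) == {0, 1, 2}  # only the most active vertices survive
```

Because the score is computed independently per vertex, this filter parallelizes trivially, which is what makes it viable at billion-vertex scale.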
The Life and Death of Unwanted Bits: Towards Proactive Waste Data Management in Digital Ecosystems
Our everyday data processing activities create massive amounts of data. Like
physical waste and trash, unwanted and unused data also pollutes the digital
environment by degrading the performance and capacity of storage systems and
requiring costly disposal. In this paper, we propose using the lessons from
real life waste management in handling waste data. We show the impact of waste
data on the performance and operational costs of our computing systems. To
allow better waste data management, we define a waste hierarchy for digital
objects and provide insights into how to identify and categorize waste data.
Finally, we introduce novel ways of reusing, reducing, and recycling data and
software to minimize the impact of data wastage.
Comment: Fixed reference
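As a loose illustration of triaging digital objects into a waste hierarchy (the category names and thresholds here are invented for the example, not the paper's taxonomy):

```python
def triage(objects, now, stale_seconds=365 * 86400):
    """Tag each object: 'recycle' if its content duplicates an earlier
    object, 'dispose' if untouched for a year, else 'keep'.
    objects: {name: (last_access_epoch, content_hash)}."""
    seen = set()
    tags = {}
    for name, (last_access, digest) in objects.items():
        if digest in seen:
            tags[name] = "recycle"        # duplicate content: reclaimable
        elif now - last_access > stale_seconds:
            tags[name] = "dispose"        # stale: candidate for disposal
        else:
            tags[name] = "keep"
        seen.add(digest)
    return tags

now = 1_700_000_000
objs = {
    "report.doc":      (now - 100, "h1"),
    "report_copy.doc": (now - 100, "h1"),   # duplicate of report.doc
    "old.log":         (now - 4 * 10**8, "h2"),  # ~12 years untouched
}
tags = triage(objs, now)
assert tags["report_copy.doc"] == "recycle"
```

The point of ordering the checks (reuse/recycle before dispose) mirrors real-life waste hierarchies, where disposal is the last resort.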
Gradient-Domain Processing for Large EM Image Stacks
We propose a new gradient-domain technique for processing registered EM image
stacks to remove the inter-image discontinuities while preserving intra-image
detail. To this end, we process the image stack by first performing anisotropic
diffusion to smooth the data along the slice axis and then solving a
screened-Poisson equation within each slice to re-introduce the detail. The
final image stack is both continuous across the slice axis (facilitating the
tracking of information between slices) and maintains sharp details within each
slice (supporting automatic feature detection). To support this editing, we
describe the implementation of the first multigrid solver designed for
efficient gradient-domain processing of large, out-of-core voxel grids.
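A 1-D toy of the per-slice screened-Poisson solve via Gauss-Seidel iteration (in the pipeline above, the data term f would come from the diffusion-smoothed stack and the target gradients g from the detailed original; the real solver is a multigrid method over 2-D slices):

```python
def screened_poisson_1d(f, g, alpha=0.5, iters=400):
    """Minimize sum_i alpha*(u[i]-f[i])^2 + sum_i ((u[i+1]-u[i]) - g[i])^2:
    stay close to the smoothed data f while reproducing the target
    gradients g, re-introducing detail."""
    n = len(f)
    u = list(f)
    for _ in range(iters):
        for i in range(n):
            num, den = alpha * f[i], alpha
            if i > 0:                  # left-neighbor gradient term
                num += u[i - 1] + g[i - 1]
                den += 1
            if i < n - 1:              # right-neighbor gradient term
                num += u[i + 1] - g[i]
                den += 1
            u[i] = num / den
    return u

# Sanity check: if g is exactly the gradient of f, the solver returns f.
f = [0.0, 1.0, 4.0, 9.0]
g = [f[i + 1] - f[i] for i in range(3)]
u = screened_poisson_1d(f, g)
assert all(abs(a - b) < 1e-6 for a, b in zip(u, f))
```

The screening weight alpha controls the trade-off: large alpha pins the result to the smoothed data, small alpha favors reproducing the sharp gradients.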
knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library
k-means is one of the most influential and utilized machine learning
algorithms. Its computation limits the performance and scalability of many
statistical analysis and machine learning tasks. We rethink and optimize
k-means in terms of modern NUMA architectures to develop a novel
parallelization scheme that delays and minimizes synchronization barriers. The
\textit{k-means NUMA Optimized Routine} (\textsf{knor}) library has (i)
in-memory (\textsf{knori}), (ii) distributed memory (\textsf{knord}), and (iii)
semi-external memory (\textsf{knors}) modules that radically improve the
performance of k-means for varying memory and hardware budgets. \textsf{knori}
boosts performance for single machine datasets by an order of magnitude or
more. \textsf{knors} improves the scalability of k-means on a memory budget
using SSDs. \textsf{knors} scales to billions of points on a single machine,
using a fraction of the resources that distributed in-memory systems require.
\textsf{knord} retains \textsf{knori}'s performance characteristics, while
scaling in-memory through distributed computation in the cloud. \textsf{knor}
modifies Elkan's triangle inequality pruning algorithm such that we utilize it
on billion-point datasets without the significant memory overhead of the
original algorithm. We demonstrate that \textsf{knor} outperforms distributed
commercial products like H2O, Turi (formerly Dato, GraphLab) and Spark's
MLlib by more than an order of magnitude on datasets of up to billions of
points.
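A sketch of the triangle-inequality pruning in the assignment step (a simplified variant of Elkan's rule, not knor's memory-reduced version: if dist(c_best, c_j) >= 2 * dist(x, c_best), then c_j provably cannot be closer to x, so the distance to it is never computed):

```python
import math

def assign_pruned(points, centers):
    """k-means assignment with triangle-inequality pruning.
    Returns labels and the number of distance computations skipped."""
    # Pairwise center distances, computed once per iteration.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, len(centers)):
            if cc[best][j] >= 2 * best_d:
                skipped += 1          # provably no closer: skip the distance
                continue
            dj = math.dist(x, centers[j])
            if dj < best_d:
                best, best_d = j, dj
        labels.append(best)
    return labels, skipped

points = [(0.1, 0.0), (0.0, 0.2), (9.9, 10.0), (10.0, 9.8)]
centers = [(0.0, 0.0), (10.0, 10.0)]
labels, skipped = assign_pruned(points, centers)
assert labels == [0, 0, 1, 1] and skipped == 2
```

The pruning is exact (by the triangle inequality, dist(x, c_j) >= cc[best][j] - best_d >= best_d), so labels match a brute-force assignment while well-separated clusters skip most distance computations.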
FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs
R is one of the most popular programming languages for statistics and machine
learning, but the R framework is relatively slow and unable to scale to large
datasets. The general approach for speeding up an implementation in R is to
implement the algorithms in C or FORTRAN and provide an R wrapper. FlashR takes
a different approach: it executes R code in parallel and scales the code beyond
memory capacity by utilizing solid-state drives (SSDs) automatically. It
provides a small number of generalized operations (GenOps) upon which we
reimplement a large number of matrix functions in the R base package. As such,
FlashR parallelizes and scales existing R code with little or no modification. To
reduce data movement between CPU and SSDs, FlashR evaluates matrix operations
lazily, fuses operations at runtime, and uses cache-aware, two-level matrix
partitioning. We evaluate FlashR on a variety of machine learning and
statistics algorithms on inputs of up to four billion data points. FlashR
running out of core closely tracks the performance of FlashR in memory. The R
code for machine learning algorithms executed in FlashR outperforms the
in-memory execution of H2O and Spark MLlib by a factor of 2-10 and outperforms
Revolution R Open by more than an order of magnitude.
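A minimal sketch of lazy evaluation with operation fusion (an invented toy class, not FlashR's GenOps API): elementwise operations are merely recorded, then applied in one fused pass at materialization time, so no intermediate array is ever written out.

```python
class LazyVec:
    """Record elementwise ops; fuse them into a single pass over the
    data at materialize time, avoiding per-op temporaries and the data
    movement they would cost."""
    def __init__(self, data, ops=()):
        self.data, self.ops = data, list(ops)

    def map(self, fn):
        return LazyVec(self.data, self.ops + [fn])   # no work done yet

    def materialize(self):
        out = []
        for x in self.data:        # one pass applies every fused op
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

v = LazyVec([1.0, 2.0, 3.0]).map(lambda t: t * 2).map(lambda t: t + 1)
assert v.materialize() == [3.0, 5.0, 7.0]
```

When the data lives on SSDs rather than in RAM, avoiding a round trip per operation is the difference between I/O-bound and compute-bound execution.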
Gradient-Domain Fusion for Color Correction in Large EM Image Stacks
We propose a new gradient-domain technique for processing registered EM image
stacks to remove inter-image discontinuities while preserving intra-image
detail. To this end, we process the image stack by first performing anisotropic
smoothing along the slice axis and then solving a Poisson equation within each
slice to re-introduce the detail. The final image stack is continuous across
the slice axis and maintains sharp details within each slice. Adapting existing
out-of-core techniques for solving the linear system, we describe a parallel
algorithm with time complexity that is linear in the size of the data and space
complexity that is sub-linear, allowing us to process datasets as large as five
teravoxels with a 600 MB memory footprint
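A toy of the out-of-core pass along the slice axis (a plain moving average stands in for the anisotropic smoothing; the point is that only a small window of slices is ever resident, so memory use is independent of stack depth):

```python
from collections import deque

def smooth_slices(slice_stream, radius=1):
    """Stream slices (e.g. read from disk one at a time), keep a
    (2*radius+1)-slice window in memory, and yield the per-pixel
    average for the window's center slice."""
    window = deque(maxlen=2 * radius + 1)
    for s in slice_stream:
        window.append(s)               # oldest slice is evicted for free
        if len(window) == window.maxlen:
            yield [sum(px) / len(window) for px in zip(*window)]

# A tiny 4-slice stack of 2-pixel slices.
stack = iter([[0.0, 0.0], [3.0, 6.0], [0.0, 0.0], [3.0, 6.0]])
out = list(smooth_slices(stack, radius=1))
assert out == [[1.0, 2.0], [2.0, 4.0]]
```

This streaming structure is what yields the sub-linear space bound: a teravoxel-scale stack can be processed while holding only a few slices at a time.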
An SSD-based eigensolver for spectral analysis on billion-node graphs
Many eigensolvers such as ARPACK and Anasazi have been developed to compute
eigenvalues of a large sparse matrix. These eigensolvers are limited by the
capacity of RAM. They run in the memory of a single machine for smaller
eigenvalue problems and require distributed memory for larger problems.
In contrast, we develop an SSD-based eigensolver framework called FlashEigen,
which extends Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph
with hundreds of millions or even billions of vertices in a single machine.
FlashEigen performs sparse matrix multiplication in a semi-external memory
fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in
memory. We store the entire vector subspace on SSDs and reduce I/O by caching
the most recent dense matrix. Our results show that FlashEigen achieves
40%-60% of the performance of its in-memory implementation and is comparable
to the Anasazi eigensolvers on a
machine with 48 CPU cores. Furthermore, it is capable of scaling to a graph
with 3.4 billion vertices and 129 billion edges. It takes about four hours to
compute eight eigenvalues of the billion-node graph using 120 GB of memory.
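A sketch of the semi-external-memory multiply at the core of such an eigensolver (a generator stands in for the SSD-resident sparse matrix; only the dense matrix and one sparse row are held in memory):

```python
def semi_external_spmv(sparse_rows, dense):
    """Sparse-by-dense multiply where the sparse matrix is streamed
    row by row (as if from SSD) while the dense matrix stays in RAM.
    Each sparse row is a list of (column, value) pairs."""
    width = len(dense[0])
    result = []
    for row in sparse_rows:            # one sparse row resident at a time
        acc = [0.0] * width
        for col, val in row:
            for k in range(width):
                acc[k] += val * dense[col][k]
        result.append(acc)
    return result

rows = iter([[(0, 1.0), (1, 2.0)], [(1, 3.0)]])
dense = [[1.0, 0.0], [0.0, 1.0]]     # 2x2 identity as the in-memory factor
assert semi_external_spmv(rows, dense) == [[1.0, 2.0], [0.0, 3.0]]
```

Keeping the dense subspace vectors in RAM while streaming the much larger sparse adjacency matrix is what lets a single machine handle billion-node graphs.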
- …